Food & Environment 🌿¶

Over the past few years, concerns about the environmental impact of animal agriculture have grown. It is important to understand how our food choices affect the planet. By analyzing data on the environmental impacts of different types of food, we can gain valuable insights into how these products affect the environment.

These concerns are the driving force behind this analysis. In particular, animal agriculture is a significant contributor to greenhouse gas emissions. It is responsible for global methane and nitrous oxide emissions, which are powerful greenhouse gases. In addition, animal agriculture also uses large amounts of water and land.

To carry out this analysis, we will use Pandas for data manipulation and analysis, Seaborn for data visualization, Matplotlib for creating custom plots, SciPy for statistical tests and analysis, and SciKit-Learn for machine learning.

This tutorial will guide users through a typical data science pipeline, while also providing information about the issue of animal agriculture on the environment and why it is a pressing problem.

Data Collection¶

Initially, we must collect relevant data from the internet to conduct our analysis:

  1. Environmental Impact of Food Production
  2. Meat Consumption

We will download each respective CSV file and then use Pandas to read and manipulate them accordingly to set up for data management.

In [57]:
import pandas as pd # Import Pandas library

CSV1 = 'Food_Production.csv'  # Initialize CSV file containing environmental impact of food production data
FOOD_PRODUCTS = ["Bread", "Maize", "Barley",
                 "Oatmeal", "Rice", "Potato",
                 "Cassava", "Sugar", "Beet Sugar",
                 "Pulse", "Pea", "Nut",
                 "Groundnut", "Soymilk", "Tofu",
                 "Soybean Oil", "Palm Oil", "Sunflower Oil",
                 "Rapeseed Oil", "Olive Oil", "Tomato",
                 "Onion & Leek", "Root Veg.", "Brassica",
                 "Veg.", "Citrus Fruit", "Banana",
                 "Apple", "Berry & Grape", "Wine",
                 "Fruit", "Coffee", "Dark Chocolate",
                 "Beef", "Dairy Beef", "Lamb",
                 "Pig", "Poultry", "Milk",
                 "Cheese", "Egg", "Fish", "Shrimp"] # Create list of abbreviations for clearer graph labels (current ones are too long)

df1 = pd.read_csv(CSV1) # Read CSV file and store as a Pandas dataframe
df1.loc[:, "Food product"] = FOOD_PRODUCTS # Replace column in the dataframe called "Food product" and initialize with abbreviations
df1.head() # Print the first few entries of the dataframe
Out[57]:
Food product Land use change Animal Feed Farm Processing Transport Packging Retail Total_emissions Eutrophying emissions per 1000kcal (gPO₄eq per 1000kcal) ... Freshwater withdrawals per 100g protein (liters per 100g protein) Freshwater withdrawals per kilogram (liters per kilogram) Greenhouse gas emissions per 1000kcal (kgCO₂eq per 1000kcal) Greenhouse gas emissions per 100g protein (kgCO₂eq per 100g protein) Land use per 1000kcal (m² per 1000kcal) Land use per kilogram (m² per kilogram) Land use per 100g protein (m² per 100g protein) Scarcity-weighted water use per kilogram (liters per kilogram) Scarcity-weighted water use per 100g protein (liters per 100g protein) Scarcity-weighted water use per 1000kcal (liters per 1000 kilocalories)
0 Bread 0.1 0.0 0.8 0.2 0.1 0.1 0.1 1.4 NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1 Maize 0.3 0.0 0.5 0.1 0.1 0.1 0.0 1.1 NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2 Barley 0.0 0.0 0.2 0.1 0.0 0.5 0.3 1.1 NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
3 Oatmeal 0.0 0.0 1.4 0.0 0.1 0.1 0.0 1.6 4.281357 ... 371.076923 482.4 0.945482 1.907692 2.897446 7.6 5.846154 18786.2 14450.92308 7162.104461
4 Rice 0.0 0.0 3.6 0.1 0.1 0.1 0.1 4.0 9.514379 ... 3166.760563 2248.4 1.207271 6.267606 0.759631 2.8 3.943662 49576.3 69825.77465 13449.891480

5 rows × 23 columns

Now that we have created our first dataframe containing the first dataset, let's create the next one.

In [58]:
CSV2 = 'meat_consumption_worldwide.csv' # Initialize CSV file containing meat consumption worldwide data

df2 = pd.read_csv(CSV2) # Read CSV file and store as a Pandas dataframe
df2.drop(df2.loc[df2["LOCATION"] == "WLD"].index, inplace = True) # Remove WLD (global) from the dataframe, since we only want to look at individual countries
df2.head() # Print the first few entries of the dataframe
Out[58]:
LOCATION SUBJECT MEASURE TIME Value
0 AUS BEEF KG_CAP 1991 27.721815
1 AUS BEEF KG_CAP 1992 26.199591
2 AUS BEEF KG_CAP 1993 26.169094
3 AUS BEEF KG_CAP 1994 25.456134
4 AUS BEEF KG_CAP 1995 25.340226

We are done reading our data, so we can continue to the next stage where we will deal with the NaN values.

Data Management¶

To ensure our data can be used coherently in a representation through graphs, we must manage it by creating new dataframes that can be easily graphed using Seaborn.

We will split df1 into two dataframes: df3 and df4. df3 will contain the first 9 columns, which are data calculated relative to each other. df4 will contain all the columns after, which are data calculated by a certain metric. This will be clearer when we graph the data.

In [59]:
END = 9 # Set the end range

df3 = df1.iloc[:, :END] # Keep the first 9 columns and store it as df1
df3.head() # Print the first few entries of the dataframe
Out[59]:
Food product Land use change Animal Feed Farm Processing Transport Packging Retail Total_emissions
0 Bread 0.1 0.0 0.8 0.2 0.1 0.1 0.1 1.4
1 Maize 0.3 0.0 0.5 0.1 0.1 0.1 0.0 1.1
2 Barley 0.0 0.0 0.2 0.1 0.0 0.5 0.3 1.1
3 Oatmeal 0.0 0.0 1.4 0.0 0.1 0.1 0.0 1.6
4 Rice 0.0 0.0 3.6 0.1 0.1 0.1 0.1 4.0
In [60]:
START = 9 # Set the start range
END = 23 # Set the end range
RANGE = [0] + list(range(START, END)) # Set the range of columns we want

df4 = df1.copy() # Create a copy to prevent warning
df4 = df4.iloc[:, RANGE] # Keep the columns after the 9th
df4.dropna(inplace = True) # Remove all NaNs since they don't provide any data
df4.reset_index(drop = True, inplace = True) # Reset our index for clarity
df4.head() # Print the first few entries of the dataframe
Out[60]:
Food product Eutrophying emissions per 1000kcal (gPO₄eq per 1000kcal) Eutrophying emissions per kilogram (gPO₄eq per kilogram) Eutrophying emissions per 100g protein (gPO₄eq per 100 grams protein) Freshwater withdrawals per 1000kcal (liters per 1000kcal) Freshwater withdrawals per 100g protein (liters per 100g protein) Freshwater withdrawals per kilogram (liters per kilogram) Greenhouse gas emissions per 1000kcal (kgCO₂eq per 1000kcal) Greenhouse gas emissions per 100g protein (kgCO₂eq per 100g protein) Land use per 1000kcal (m² per 1000kcal) Land use per kilogram (m² per kilogram) Land use per 100g protein (m² per 100g protein) Scarcity-weighted water use per kilogram (liters per kilogram) Scarcity-weighted water use per 100g protein (liters per 100g protein) Scarcity-weighted water use per 1000kcal (liters per 1000 kilocalories)
0 Oatmeal 4.281357 11.23 8.638462 183.911552 371.076923 482.4 0.945482 1.907692 2.897446 7.60 5.846154 18786.2 14450.92308 7162.104461
1 Rice 9.514379 35.07 49.394366 609.983722 3166.760563 2248.4 1.207271 6.267606 0.759631 2.80 3.943662 49576.3 69825.77465 13449.891480
2 Potato 4.754098 3.48 20.470588 80.737705 347.647059 59.1 0.628415 2.705882 1.202186 0.88 5.176471 2754.2 16201.17647 3762.568306
3 Nut 3.113821 19.15 11.726883 672.162602 2531.414574 4133.8 0.069919 0.263319 2.107317 12.96 7.936314 229889.8 140777.58730 37380.455280
4 Groundnut 2.437931 14.14 5.401070 319.362069 707.524828 1852.3 0.556897 1.233766 1.570690 9.11 3.479756 61797.9 23605.00382 10654.810340
In [61]:
PLANT_PRODUCTS = {"Bread", "Maize", "Barley",
                 "Oatmeal", "Rice", "Potato",
                 "Cassava", "Sugar", "Beet Sugar",
                 "Pulse", "Pea", "Nut",
                 "Groundnut", "Soymilk", "Tofu",
                 "Soybean Oil", "Palm Oil", "Sunflower Oil",
                 "Rapeseed Oil", "Olive Oil", "Tomato",
                 "Onion & Leek", "Root Veg.", "Brassica",
                 "Veg.", "Citrus Fruit", "Banana",
                 "Apple", "Berry & Grape", "Wine",
                 "Fruit", "Coffee", "Dark Chocolate"}

ANIMAL_PRODUCTS = {"Beef", "Dairy Beef", "Lamb",
                   "Pig", "Poultry", "Milk",
                   "Cheese", "Egg", "Fish", "Shrimp"}

df5 = df3.copy()
df6 = df4.copy()

df5['Food type'] = None
df6['Food type'] = None

for i, r in df5.iterrows():
  if r['Food product'] in PLANT_PRODUCTS:
    df5.loc[i, 'Food type'] = 'Plant'
  elif r['Food product'] in ANIMAL_PRODUCTS:
    df5.loc[i, 'Food type'] = 'Animal'

for i, r in df6.iterrows():
  if r['Food product'] in PLANT_PRODUCTS:
    df6.loc[i, 'Food type'] = 'Plant'
  elif r['Food product'] in ANIMAL_PRODUCTS:
    df6.loc[i, 'Food type'] = 'Animal'

df5 = df5.groupby("Food type").mean()
df5["Food type"] = df5.index
df5.reset_index(drop = True, inplace = True)
df5
Out[61]:
Land use change Animal Feed Farm Processing Transport Packging Retail Total_emissions Food type
0 2.810000 1.95 10.490000 0.500000 0.240000 0.220000 0.180000 16.390000 Animal
1 0.790909 0.00 1.342424 0.178788 0.181818 0.284848 0.036364 2.815152 Plant
In [62]:
df6 = df6.groupby("Food type").mean()
df6["Food type"] = df6.index
df6.reset_index(drop = True, inplace = True)
df6
Out[62]:
Eutrophying emissions per 1000kcal (gPO₄eq per 1000kcal) Eutrophying emissions per kilogram (gPO₄eq per kilogram) Eutrophying emissions per 100g protein (gPO₄eq per 100 grams protein) Freshwater withdrawals per 1000kcal (liters per 1000kcal) Freshwater withdrawals per 100g protein (liters per 100g protein) Freshwater withdrawals per kilogram (liters per kilogram) Greenhouse gas emissions per 1000kcal (kgCO₂eq per 1000kcal) Greenhouse gas emissions per 100g protein (kgCO₂eq per 100g protein) Land use per 1000kcal (m² per 1000kcal) Land use per kilogram (m² per kilogram) Land use per 100g protein (m² per 100g protein) Scarcity-weighted water use per kilogram (liters per kilogram) Scarcity-weighted water use per 100g protein (liters per 100g protein) Scarcity-weighted water use per 1000kcal (liters per 1000 kilocalories) Food type
0 58.085221 139.423333 73.289922 906.822192 1230.689663 2102.944444 10.436802 14.490724 34.723544 97.806667 51.590835 70855.533333 41650.170546 28049.055284 Animal
1 23.109872 20.742667 49.746990 422.777740 1728.610704 711.420000 5.741551 14.587478 5.421294 8.788667 19.385191 28073.860000 76078.624420 15746.276693 Plant

Next, we will split df2 into three dataframes: df7, df8, and df9. df7 will contain 2022 data on relative meat consumption per country. df8 will contain data on how each country's meat consumption changes every year. df9 will contain data on how all countries' meat consumption changes every year.

In [63]:
df7 = df2.loc[df2["TIME"] == 2022] # Filter to contain only 2022 data and store it as df7
df7.reset_index(drop = True, inplace = True) # Reset our index for clarity
df7.head() # Print the first few entries of the dataframe
Out[63]:
LOCATION SUBJECT MEASURE TIME Value
0 AUS BEEF KG_CAP 2022 19.951395
1 AUS PIG KG_CAP 2022 20.795147
2 AUS POULTRY KG_CAP 2022 45.974985
3 AUS SHEEP KG_CAP 2022 8.606749
4 CAN BEEF KG_CAP 2022 18.426918

We should group the data by location since we want to observe the total meat consumption per country in our visualization.

In [64]:
df7 = df7.groupby("LOCATION").mean() # Group the data by location and average other values
df7["LOCATION"] = df7.index # Add "LOCATION" back as a column (after it's removed by groupby)
df7.reset_index(drop = True, inplace = True) # Reset the index and remove the "LOCATION" index
df7.drop(columns = ["TIME"], inplace = True) # Drop "TIME" since we're only looking at 2022
df7.head() # Print the first few entries of the dataframe
Out[64]:
Value LOCATION
0 654.438777 ARG
1 395.792480 AUS
2 92.761958 BGD
3 2742.902830 BRA
4 16958.468271 BRICS

Since we aim to visualize this data, we need to come up with a solid range of values to separate relative meat consumption into categories so we can clearly see which countries consume more meat than others.

Let's sort this data and view it all.

In [65]:
df7.sort_values("Value")
Out[65]:
Value LOCATION
15 28.316855 HTI
24 40.967186 MOZ
14 48.032780 GHA
46 48.279836 ZMB
42 58.287311 URY
28 63.785447 NZL
33 72.225519 PRY
40 77.438105 TZA
12 84.938557 ETH
2 92.761958 BGD
27 97.187500 NOR
19 119.053703 ISR
10 119.396307 DZA
36 119.653849 SDN
21 138.128745 KAZ
6 149.544583 CHE
26 199.659620 NGA
7 224.346100 CHL
38 247.684181 THA
31 247.995524 PER
25 279.695257 MYS
35 286.645753 SAU
41 304.334280 UKR
11 331.591675 EGY
9 357.508500 COL
18 378.662937 IRN
39 395.481716 TUR
1 395.792480 AUS
5 421.909062 CAN
45 443.961610 ZAF
30 451.054780 PAK
22 473.225893 KOR
16 498.350871 IDN
32 524.246670 PHL
0 654.438777 ARG
17 709.425892 IND
20 720.685672 JPN
44 893.342928 VNM
23 1044.307995 MEX
34 1393.345149 RUS
37 1468.064750 SSA
3 2742.902830 BRA
43 5193.676467 USA
13 5500.365924 EU28
8 11695.816209 CHN
29 14593.153328 OECD
4 16958.468271 BRICS

From this, we can intuitively suggest the following range values: 0, 100, 500, 1000, 5000, 10000, and 20000.

Next, we can create df6 and df7 accordingly.

In [66]:
df8 = df2.groupby(["LOCATION", "TIME"]).mean()
df8.reset_index(inplace = True)
df8.head()
Out[66]:
LOCATION TIME Value
0 ARG 1990 524.747862
1 ARG 1991 393.885581
2 ARG 1992 420.682693
3 ARG 1993 447.727484
4 ARG 1994 436.872628
In [67]:
df9 = df2.groupby("TIME").mean()
df9.reset_index(inplace = True)
df9.head()
Out[67]:
TIME Value
0 1990 506.459977
1 1991 616.501783
2 1992 689.963810
3 1993 794.275811
4 1994 837.991369

Data Representation¶

In [68]:
import seaborn as sns, matplotlib.pyplot as plt

sns.set(rc = {'figure.figsize': (25, 25)})

_, ((ax1, ax2), (ax3, ax4)) = plt.subplots(nrows = 2, ncols = 2)

p1 = sns.pointplot(x = "Food product", y = "Land use change", color = "red", data = df3, ax = ax1)
p1.set(xlabel = "FOOD PRODUCT", ylabel = "LAND USE CHANGE")
p1.set_xticklabels(p1.get_xticklabels(), rotation = 90)

p2 = sns.pointplot(x = "Food product", y = "Animal Feed", color = "green", data = df3, ax = ax2)
p2.set(xlabel = "FOOD PRODUCT", ylabel = "ANIMAL FEED")
p2.set_xticklabels(p2.get_xticklabels(), rotation = 90)

p3 = sns.pointplot(x = "Food product", y = "Transport", color = "blue", data = df3, ax = ax3)
p3.set(xlabel = "FOOD PRODUCT", ylabel = "TRANSPORT")
p3.set_xticklabels(p3.get_xticklabels(), rotation = 90)

p4 = sns.pointplot(x = "Food product", y = "Total_emissions", color = "yellow", data = df3, ax = ax4)
p4.set(xlabel = "FOOD PRODUCT", ylabel = "TOTAL EMISSIONS")
p4.set_xticklabels(p4.get_xticklabels(), rotation = 90)

ax1; ax2; ax3; ax4
Out[68]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f1c58b48160>
In [69]:
_, ((ax1, ax2), (ax3, ax4)) = plt.subplots(nrows = 2, ncols = 2)

p1 = sns.barplot(x = "Food product", y = "Eutrophying emissions per 1000kcal (gPO₄eq per 1000kcal)", data = df4, ax = ax1)
p1.set(xlabel = "FOOD PRODUCT", ylabel = "EUTROPHYING EMISSIONS (gPO₄eq per 1000kcal)")
p1.set_xticklabels(p1.get_xticklabels(), rotation = 90)

p2 = sns.barplot(x = "Food product", y = "Freshwater withdrawals per 1000kcal (liters per 1000kcal)", data = df4, ax = ax2)
p2.set(xlabel = "FOOD PRODUCT", ylabel = "WATER USE (liters per 1000kcal)")
p2.set_xticklabels(p2.get_xticklabels(), rotation = 90)

p3 = sns.barplot(x = "Food product", y = "Greenhouse gas emissions per 1000kcal (kgCO₂eq per 1000kcal)", data = df4, ax = ax3)
p3.set(xlabel = "FOOD PRODUCT", ylabel = "GREENHOUSE GAS EMISSIONS (kgCO₂eq per 1000kcal)")
p3.set_xticklabels(p3.get_xticklabels(), rotation = 90)

p4 = sns.barplot(x = "Food product", y = "Land use per 1000kcal (m² per 1000kcal)", data = df4, ax = ax4)
p4.set(xlabel = "FOOD PRODUCT", ylabel = "LAND USE (m² per 1000kcal)")
p4.set_xticklabels(p4.get_xticklabels(), rotation = 90)

ax1; ax2; ax3; ax4
Out[69]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f1c586dd1c0>
In [70]:
import folium
from branca.element import Figure

fig = Figure(width = 1250, height = 750)
world_map = folium.Map(location = [42.5, 0], zoom_start = 2)

folium.Choropleth(
    geo_data = 'https://raw.githubusercontent.com/datasets/geo-countries/master/data/countries.geojson',
    data = df7,
    columns = ['LOCATION', 'Value'],
    key_on = 'properties.ISO_A3',  
    fill_color = 'YlOrRd',
    legend_name = "Relative Meat Consumption",
    threshold_scale = [0, 100, 500, 1000, 5000, 10000, 20000]
).add_to(world_map)

fig.add_child(world_map)
fig
Out[70]:

Data Analysis¶

In [71]:
sns.set(rc = {'figure.figsize': (20, 20)})

_, ((ax1, ax2), (ax3, ax4)) = plt.subplots(nrows = 2, ncols = 2)

p1 = sns.pointplot(x = "Food type", y = "Land use change", color = "red", data = df5, ax = ax1)
p1.set(xlabel = "FOOD TYPE", ylabel = "LAND USE CHANGE")
p1.set_xticklabels(p1.get_xticklabels(), rotation = 90)

p2 = sns.pointplot(x = "Food type", y = "Animal Feed", color = "green", data = df5, ax = ax2)
p2.set(xlabel = "FOOD TYPE", ylabel = "ANIMAL FEED")
p2.set_xticklabels(p2.get_xticklabels(), rotation = 90)

p3 = sns.pointplot(x = "Food type", y = "Transport", color = "blue", data = df5, ax = ax3)
p3.set(xlabel = "FOOD TYPE", ylabel = "TRANSPORT")
p3.set_xticklabels(p3.get_xticklabels(), rotation = 90)

p4 = sns.pointplot(x = "Food type", y = "Total_emissions", color = "yellow", data = df5, ax = ax4)
p4.set(xlabel = "FOOD TYPE", ylabel = "TOTAL EMISSIONS")
p4.set_xticklabels(p4.get_xticklabels(), rotation = 90)

ax1; ax2; ax3; ax4
Out[71]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f1c587a4d30>
In [72]:
_, ((ax1, ax2), (ax3, ax4)) = plt.subplots(nrows = 2, ncols = 2)

p1 = sns.barplot(x = "Food type", y = "Eutrophying emissions per 1000kcal (gPO₄eq per 1000kcal)", data = df6, ax = ax1)
p1.set(xlabel = "FOOD TYPE", ylabel = "EUTROPHYING EMISSIONS (gPO₄eq per 1000kcal)")
p1.set_xticklabels(p1.get_xticklabels(), rotation = 90)

p2 = sns.barplot(x = "Food type", y = "Freshwater withdrawals per 1000kcal (liters per 1000kcal)", data = df6, ax = ax2)
p2.set(xlabel = "FOOD TYPE", ylabel = "WATER USE (liters per 1000kcal)")
p2.set_xticklabels(p2.get_xticklabels(), rotation = 90)

p3 = sns.barplot(x = "Food type", y = "Greenhouse gas emissions per 1000kcal (kgCO₂eq per 1000kcal)", data = df6, ax = ax3)
p3.set(xlabel = "FOOD TYPE", ylabel = "GREENHOUSE GAS EMISSIONS (kgCO₂eq per 1000kcal)")
p3.set_xticklabels(p3.get_xticklabels(), rotation = 90)

p4 = sns.barplot(x = "Food type", y = "Land use per 1000kcal (m² per 1000kcal)", data = df6, ax = ax4)
p4.set(xlabel = "FOOD TYPE", ylabel = "LAND USE (m² per 1000kcal)")
p4.set_xticklabels(p4.get_xticklabels(), rotation = 90)

ax1; ax2; ax3; ax4
Out[72]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f1c52bbf160>
In [73]:
p = sns.scatterplot(x = "TIME", y = "Value", data = df8)
p.set(xlabel = "YEAR", ylabel = "MEAT CONSUMPTION")
p
Out[73]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f1c52ad2fd0>
In [74]:
p = sns.scatterplot(x = "TIME", y = "Value", hue = "LOCATION", data = df8)
p.set(xlabel = "YEAR", ylabel = "MEAT CONSUMPTION")
p
Out[74]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f1c52c397c0>
In [75]:
p = sns.lmplot(x = "TIME", y = "Value", hue = "LOCATION", data = df8, height = 20)
p.set(xlabel = "YEAR", ylabel = "MEAT CONSUMPTION")
p
Out[75]:
<seaborn.axisgrid.FacetGrid at 0x7f1c52917850>
In [76]:
p = sns.lineplot(x = "TIME", y = "Value", data = df9)
p.set(xlabel = "YEAR", ylabel = "MEAT CONSUMPTION")
p
Out[76]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f1c4ff0b250>
In [77]:
from sklearn.linear_model import LinearRegression 

lr = LinearRegression() 
lr.fit(df8[["TIME"]].values, df8[["Value"]].values) 

p = sns.scatterplot(x = "TIME", y = "Value", data = df8) 
p.plot(df8[["TIME"]], lr.predict(df8[["TIME"]].values), color = "red") 
p.set(xlabel = "YEAR", ylabel = "MEAT CONSUMPTION")

print(f"y = {lr.coef_[0][0]}x + {lr.intercept_[0]}") 
y = 23.988643727511615x + -46952.89307042947

Hypothesis Testing¶

In [78]:
from scipy.stats import ranksums
import numpy as np

animal_values = df5.iloc[0][["Land use change", "Animal Feed", "Transport", "Total_emissions"]].tolist() + df6.iloc[0][["Eutrophying emissions per 1000kcal (gPO₄eq per 1000kcal)", "Freshwater withdrawals per 1000kcal (liters per 1000kcal)", "Greenhouse gas emissions per 1000kcal (kgCO₂eq per 1000kcal)", "Land use per 1000kcal (m² per 1000kcal)"]].tolist()
plant_values = df5.iloc[1][["Land use change", "Animal Feed", "Transport", "Total_emissions"]].tolist() + df6.iloc[1][["Eutrophying emissions per 1000kcal (gPO₄eq per 1000kcal)", "Freshwater withdrawals per 1000kcal (liters per 1000kcal)", "Greenhouse gas emissions per 1000kcal (kgCO₂eq per 1000kcal)", "Land use per 1000kcal (m² per 1000kcal)"]].tolist()

for i in range(8):
  animal_values[i] = (animal_values[i] + 0.1) / (plant_values[i] + 0.1)
  plant_values[i] = 1

_, p = ranksums(animal_values, plant_values)

if p < 0.05:
    print("The means are significantly different.")
else:
    print("The means are not significantly different.")
The means are significantly different.
In [80]:
pred_2000 = lr.predict([[2000]])[0][0]
pred_2020 = lr.predict([[2020]])[0][0]

diff = abs(pred_2000 - pred_2020)
std = df8["Value"].std()
significance_level = 0.05

if diff > std * significance_level:
    print("Alt Hyp: The difference is significant.")
else:
    print("Nul Hyp: The difference is not significant.")
Alt Hyp: The difference is significant.

Communication of Insights Attained¶